Abstract

In this paper, we present the first results of our ongoing
early-stage research on a realtime disaster detection and
monitoring tool. Based on Wikipedia, it is language-agnostic
and leverages user-generated multimedia content shared on
online social networking sites to help disaster responders
prioritize their efforts. We make the tool and its source code
publicly available as we make progress on it. Furthermore, we
strive to publish detected disasters and accompanying
multimedia content following the Linked Data principles to
facilitate its wide consumption, redistribution, and evaluation
of its usefulness.


1 Introduction

1.1 Disaster Monitoring: A Global Challenge

According to a study (Laframboise and Loko 2012) published by
the International Monetary Fund (IMF), about 700 disasters
were registered worldwide between 2010 and 2012, affecting
more than 450 million people. According to the study,
“[d]amages have risen from an estimated US$20 billion on
average per year in the 1990s to about US$100 billion per year
during 2000-10.” The authors expect this upward trend to
continue “as a result of the rising concentration of people
living in areas more exposed to disasters, and climate change.”
In consequence, disaster monitoring will become more and more
crucial in the future.

National agencies like the Federal Emergency Management
Agency (FEMA) in the United States of America or the
Bundesamt für Bevölkerungsschutz und Katastrophenhilfe (BBK,
“Federal Office of Civil Protection and Disaster Assistance”)
in Germany work to ensure the safety of the population on
a national level, combining and providing relevant tasks and
information in a single place. The United Nations Office for
the Coordination of Humanitarian Affairs (OCHA) is a United
Nations (UN) body formed to strengthen the UN’s response to
complex emergencies and disasters. The Global Disaster Alert
and Coordination System (GDACS)⁴ is “a cooperation framework
between the United Nations, the European Commission, and
disaster managers worldwide to improve alerts, information
exchange, and coordination in the first phase after major
sudden-onset disasters.” Global companies like Facebook,⁵
Airbnb,⁶ or Google⁷ have dedicated crisis response teams that
work on making critical emergency information accessible in
times of disaster. As can be seen from the (incomprehensive)
list above, disaster detection and response is a problem
tackled on national, international, and global levels; both
from the public and private sectors.

1.2 Hypotheses and Research Questions

In this paper, we present the first results of our ongoing
early-stage research on a realtime comprehensive
Wikipedia-based monitoring system for the detection of
disasters around the globe. This system is language-agnostic
and leverages multimedia content shared on online social
networking sites, striving to help disaster responders
prioritize their efforts. Structured data about detected
disasters is made available in the form of Linked Data to
facilitate its consumption. An earlier version of this paper
without the focus on multimedia content from online social
networking sites and Linked Data was published in (Steiner
2014b). For the present and further extended work, we are
steered by the following hypotheses.


H1 Content about disasters gets added very fast to Wikipedia
and online social networking sites by people in the
neighborhood of the event.

H2 Disasters being geographically constrained, textual and
multimedia content about them on Wikipedia and social
networking sites appears first in the local language, perhaps
only later in English.


4 GDACS: http://www.gdacs.org/
5 Facebook Disaster Relief: https://www.facebook.com/DisasterRelief
6 Airbnb Disaster Response: https://www.airbnb.com/disaster-response
7 Google Crisis Response: https://www.google.org/crisisresponse/










H3 Link structure dynamics of Wikipedia provide for 
a meaningful way to detect future disasters, i.e., disasters 
unknown at system creation time. 

These hypotheses lead us to the following research questions 
that we strive to answer in the near future. 


Q1 How timely and accurate is content from Wikipedia and
online social networking sites for the purpose of disaster
detection and ongoing monitoring, compared to content from
authoritative and government sources?

Q2 To what extent can the disambiguated nature of Wikipedia
(things identified by URIs) improve on keyword-based disaster
detection approaches, e.g., via online social network sites or
search logs?

Q3 How much noise is introduced by full-text searches (which
are not based on disambiguated URIs) for multimedia content on
online social networking sites?

The remainder of the article is structured as follows. First
we discuss related work and enabling technologies in the next
section, followed by our methodology in Section 3. We describe
an evaluation strategy in Section 4, and finally conclude with
an outlook on future work in Section 5.


2 Related Work and Enabling Technologies

2.1 Disaster Detection

Digitally crowdsourced data for disaster detection and
response has gained momentum in recent years, as the Internet
has proven resilient in times of crises, compared to other
infrastructure. Ryan Falor, Crisis Response Product Manager at
Google in 2011, remarks in (Falor 2011) that “a substantial
[...] proportion of searches are directly related to the
crises; and people continue to search and access information
online even while traffic and search levels drop temporarily
during and immediately following the crises.” In the following,
we provide a non-exhaustive list of related work on digitally
crowdsourced disaster detection and response. Sakaki, Okazaki,
and Matsuo (2010) consider each user of the online social
networking (OSN) site Twitter⁸ a sensor for the purpose of
earthquake detection in Japan. Goodchild and Glennon (2010)
show how crowdsourced geodata from Wikipedia and Wikimapia,⁹
“a multilingual open-content collaborative map”, can help
complete authoritative data about disasters. Abel et al. (2012)
describe a crisis monitoring system that extracts relevant
content about known disasters from Twitter. Liu et al. (2008)
examine common patterns and norms of disaster coverage on the
photo sharing site Flickr.¹⁰ Ortmann et al. (2011) propose to
crowdsource Linked Open Data for disaster management and also
provide a good overview on well-known crowdsourcing tools like
Google Map Maker,¹¹ OpenStreetMap,¹² and Ushahidi (Okolloh
2009). We have developed a monitoring system (Steiner 2014c)
that detects news events from concurrent Wikipedia edits and
auto-generates related multimedia galleries based on content
from various OSN sites and Wikimedia Commons.¹³ Finally, Lin
and Mishne (2012) examine realtime search query churn on
Twitter, including in the context of disasters.


8 Twitter: https://twitter.com/
9 Wikimapia: http://wikimapia.org/
10 Flickr: https://www.flickr.com/
11 Google Map Maker: http://www.google.com/mapmaker
12 OpenStreetMap: http://www.openstreetmap.org/


2.2 The Common Alerting Protocol 

To facilitate collaboration, a common protocol is essential.
The Common Alerting Protocol (CAP) (Westfall 2010) is an
XML-based general data format for exchanging public warnings
and emergencies between alerting technologies. CAP allows
a warning message to be consistently disseminated
simultaneously over many warning systems to many applications.
The protocol increases warning effectiveness and simplifies
the task of activating a warning for officials. CAP also
provides the capability to include multimedia data, such as
photos, maps, or videos. Alerts can be geographically targeted
to a defined warning area. An exemplary flood warning CAP feed
stemming from GDACS is shown in Listing 1. The step from trees
to graphs can be taken through Linked Data, which we introduce
in the next section.


2.3 Linked Data and Linked Data Principles 

Linked Data (Berners-Lee 2006) defines a set of agreed-on best
practices and principles for interconnecting and publishing
structured data on the Web. It uses Web technologies like the
Hypertext Transfer Protocol (HTTP, Fielding et al. 1999) and
Uniform Resource Identifiers (URIs, Berners-Lee, Fielding, and
Masinter 2005) to create typed links between different
sources. The portal http://linkeddata.org/ defines Linked Data
as being “about using the Web to connect related data that
wasn’t previously linked, or using the Web to lower the
barriers to linking data currently linked using other methods.”
Tim Berners-Lee (2006) defined the four rules for Linked Data
in a W3C Design Issue as follows:


1. Use URIs as names for things. 

2. Use HTTP URIs so that people can look up those names. 

3. When someone looks up a URI, provide useful information,
using the standards (RDF, SPARQL).

4. Include links to other URIs, so that they can discover more 
things. 

Linked Data uses RDF (Klyne and Carroll 2004) to create typed
links between things in the world. The result is oftentimes
referred to as the Web of Data. RDF encodes statements about
things in the form of (subject, predicate, object) triples.
Heath and Bizer (2011) speak of RDF links.
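
To make the triple model concrete, the following minimal sketch represents one RDF link as a (subject, predicate, object) triple and serializes it in N-Triples syntax. The helper names (`iri`, `literal`, `termToNTriples`, `tripleToNTriples`) are hypothetical and not part of the system described in this paper; the example URIs are taken from Listing 2.

```javascript
// Hypothetical term constructors; only the triple model comes from the text.
const iri = (value) => ({ kind: 'iri', value });
const literal = (value) => ({ kind: 'literal', value: String(value) });

// Serialize a single term: IRIs in angle brackets, literals in quotes
// with backslash, quote, and newline characters escaped.
function termToNTriples(term) {
  if (term.kind === 'iri') return `<${term.value}>`;
  const escaped = term.value
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
  return `"${escaped}"`;
}

// Serialize one (subject, predicate, object) triple as an N-Triples line.
function tripleToNTriples({ subject, predicate, object }) {
  return [subject, predicate, object].map(termToNTriples).join(' ') + ' .';
}

// An RDF link stating that two URIs identify the same thing.
const sameAsLink = {
  subject: iri('http://ex.org/disaster/en:Hurricane_Gonzalo'),
  predicate: iri('http://www.w3.org/2002/07/owl#sameAs'),
  object: iri('http://live.dbpedia.org/page/Hurricane_Gonzalo'),
};
console.log(tripleToNTriples(sameAsLink));
```

A production system would more likely rely on an existing RDF library; the sketch only shows how little structure a triple needs.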


2.4 Linked Data Fragments 

Various access mechanisms to Linked Data exist on the Web,
each of which comes with its own trade-offs regarding query
performance, freshness of data, and server cost/availability.
To retrieve information about a specific subject, you can
dereference its URL. SPARQL endpoints allow clients to execute
complex queries on RDF data, but they are not always
available. While endpoints are more convenient for clients,
individual requests are considerably more expensive for
servers. Alternatively, a data dump allows you to query
locally.

13 Wikimedia Commons: https://commons.wikimedia.org/


<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">
  <identifier>GDACS_FL_4159_1</identifier>
  <sender>info@gdacs.org</sender> <sent>2014-07-14T23:59:59-00:00</sent>
  <status>Actual</status> <msgType>Alert</msgType>
  <scope>Public</scope> <incidents>4159</incidents>
  <info>
    <category>Geo</category><event>Flood</event>
    <urgency>Past</urgency><severity>Moderate</severity>
    <certainty>Unknown</certainty>
    <senderName>Global Disaster Alert and Coordination System</senderName>
    <headline /><description />
    <web>http://www.gdacs.org/reports.aspx?eventype=FL&amp;eventid=4159</web>
    <parameter><valueName>eventid</valueName><value>4159</value></parameter>
    <parameter><valueName>currentepisodeid</valueName><value>1</value></parameter>
    <parameter><valueName>glide</valueName><value /></parameter>
    <parameter><valueName>version</valueName><value>1</value></parameter>
    <parameter><valueName>fromdate</valueName><value>Wed, 21 May 2014 22:00:00 GMT</value></parameter>
    <parameter><valueName>todate</valueName><value>Mon, 14 Jul 2014 21:59:59 GMT</value></parameter>
    <parameter><valueName>eventtype</valueName><value>FL</value></parameter>
    <parameter><valueName>alertlevel</valueName><value>Green</value></parameter>
    <parameter><valueName>alerttype</valueName><value>automatic</value></parameter>
    <parameter><valueName>link</valueName><value>http://www.gdacs.org/report.aspx?eventtype=FL&amp;eventid=4159</value></parameter>
    <parameter><valueName>country</valueName><value>Brazil</value></parameter>
    <parameter><valueName>eventname</valueName><value /></parameter>
    <parameter><valueName>severity</valueName><value>Magnitude 7.44</value></parameter>
    <parameter><valueName>population</valueName><value>0 killed and 0 displaced</value></parameter>
    <parameter><valueName>vulnerability</valueName><value /></parameter>
    <parameter><valueName>sourceid</valueName><value>DFO</value></parameter>
    <parameter><valueName>iso3</valueName><value /></parameter>
    <parameter>
      <valueName>hazardcomponents</valueName><value>FL,dead=0,displaced=0,main_cause=Heavy Rain,severity=2,sqkm=256564.57</value>
    </parameter>
    <parameter><valueName>datemodified</valueName><value>Mon, 01 Jan 0001 00:00:00 GMT</value></parameter>
    <area><areaDesc>Polygon</areaDesc><polygon>,,100</polygon></area>
  </info>
</alert>

Listing 1: Common Alerting Protocol feed via the Global Disaster Alert and Coordination System
(http://www.gdacs.org/xml/gdacs_cap.xml, 2014-07-16)


Linked Data Fragments (Verborgh et al. 2014) provide a uniform
view on all such possible interfaces to Linked Data, by
describing each specific type of interface by the kind of
fragments through which it allows access to the dataset. Each
fragment consists of three parts:

data: all triples of this dataset that match a specific
selector;

metadata: triples that describe the dataset and/or the Linked
Data Fragment;

controls: hypermedia links and/or forms that lead to other
Linked Data Fragments.

This view makes it possible to describe new interfaces with
different trade-off combinations. One such interface is triple
pattern fragments (Verborgh et al. 2014), which enables users
to host Linked Data on low-cost servers with higher
availability than public SPARQL endpoints. Such a light-weight
mechanism is ideal to expose live disaster monitoring data.
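
As an illustration of the three parts of a fragment, the sketch below builds a triple pattern fragment over a small in-memory dataset. It is a simplification under stated assumptions: the function name `triplePatternFragment`, the `totalItems` metadata field, the `searchTemplate` control, and the `ex:` terms are hypothetical, and a real server would express metadata and controls as RDF rather than plain objects.

```javascript
// A tiny in-memory dataset of [subject, predicate, object] triples
// (the ex: terms are made up for illustration).
const dataset = [
  ['ex:Hurricane_Gonzalo', 'rdf:type', 'ex:TropicalCyclone'],
  ['ex:Hurricane_Gonzalo', 'ex:relatedMediaItems', '_:video1'],
  ['ex:Typhoon_Rammasun', 'rdf:type', 'ex:TropicalCyclone'],
];

// Build one fragment for a triple pattern. Components of the pattern
// that are left undefined act as wildcards.
function triplePatternFragment(triples, pattern, baseUrl) {
  const { subject, predicate, object } = pattern;
  const data = triples.filter(([s, p, o]) =>
    (subject === undefined || s === subject) &&
    (predicate === undefined || p === predicate) &&
    (object === undefined || o === object));
  return {
    // data: all triples that match the selector
    data,
    // metadata: here only the number of matching triples
    metadata: { totalItems: data.length },
    // controls: a URI template that leads to other fragments
    controls: { searchTemplate: `${baseUrl}{?subject,predicate,object}` },
  };
}

const cyclones = triplePatternFragment(
  dataset,
  { predicate: 'rdf:type', object: 'ex:TropicalCyclone' },
  'http://example.org/disasters');
console.log(cyclones.metadata.totalItems, cyclones.data);
```

Because the server only ever evaluates a single pattern per request, the cost per request stays low, which is the trade-off the text refers to.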

3 Proposed Methodology 

3.1 Leveraging Wikipedia Link Structure 

Wikipedia is an international online encyclopedia currently
available in 287 languages¹⁴ with these characteristics:

1. Articles in one language are interlinked with versions of
the same article in other languages, e.g., the article
“Natural disaster” on the English Wikipedia
(http://en.wikipedia.org/wiki/Natural_disaster) links to 74
versions of this article in different languages.¹⁵ We note
that there exist similarities and differences among Wikipedias
with “salient information” that is unique to each language as
well as more widely shared facts (Bao et al. 2012).

2. Each article can have redirects, i.e., alternative URLs
that point to the article. For the English “Natural disaster”
article, there are eight redirects,¹⁶ e.g., “Natural Hazard”
(synonym), “Examples of natural disaster” (refinement), or
“Natural disasters” (plural).

3. For each article, the list of back links that link to the
current article is available, i.e., inbound links other than
redirects. The article “Natural disaster” has more than 500
articles that link to it.¹⁷ Likewise, the list of outbound
links, i.e., other articles that the current article links to,
is available.¹⁸

14 All Wikipedias: http://meta.wikimedia.org/wiki/List_of_Wikipedias
15 Article language links: http://en.wikipedia.org/w/api.php?action=query&prop=langlinks&lllimit=max&titles=Natural_disaster

By combining an article’s in- and outbound links, we determine
the set of mutual links, i.e., the set of articles that the
current article links to (outbound links) and at the same time
receives links from (inbound links).
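
The mutual links are simply the intersection of the inbound and outbound link sets. A minimal sketch, assuming both lists have already been retrieved as arrays of article titles (the function name `mutualLinks` is hypothetical):

```javascript
// Return the titles that occur both in the inbound and in the outbound
// link list of an article, without duplicates.
function mutualLinks(inbound, outbound) {
  const inboundSet = new Set(inbound);
  return [...new Set(outbound)].filter((title) => inboundSet.has(title));
}

// Example with made-up excerpts of the link lists of "Tropical cyclone".
const inboundExample = ['2014 Pacific typhoon season', 'Typhoon Rammasun (2014)'];
const outboundExample = ['2014 Pacific typhoon season', 'Wind'];
console.log(mutualLinks(inboundExample, outboundExample));
```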

3.2 Identification of Wikipedia Articles for 
Monitoring 

Starting with the well-curated English seed article “Natural
disaster”, we programmatically follow each of the therein
contained links of type “Main article:”, which leads to an
exhaustive list of English articles of concrete types of
disasters, e.g., “Tsunami”
(http://en.wikipedia.org/wiki/Tsunami), “Flood”
(http://en.wikipedia.org/wiki/Flood), “Earthquake”
(http://en.wikipedia.org/wiki/Earthquake), etc. In total, we
obtain links to 20 English articles about different types of
disasters.¹⁹ For each of these English disaster articles, we
obtain all versions of each article in different languages
[step (i) above], and of the resulting list of international
articles in turn all their redirect URLs [step (ii) above].
The intermediate result is a complete list of all (currently
1,270) articles in all Wikipedia languages and all their
redirects that have any type of disaster as their subject. We
call this list the “disasters list” and make it publicly
available in different formats (.txt, .tsv, and .json), where
the JSON version is the most flexible and recommended one.²⁰

16 Article redirects: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&blfilterredir=redirects&bllimit=max&bltitle=Natural_disaster
17 Article inbound links: http://en.wikipedia.org/w/api.php?action=query&list=backlinks&bllimit=max&blnamespace=0&bltitle=Natural_disaster
18 Article outbound links: http://en.wikipedia.org/w/api.php?action=query&prop=links&plnamespace=0&format=json&pllimit=max&titles=Natural_disaster
19 “Avalanche”, “Blizzard”, “Cyclone”, “Drought”, “Earthquake”,
“Epidemic”, “Extratropical cyclone”, “Flood”, “Gamma-ray
burst”, “Hail”, “Heat wave”, “Impact event”, “Limnic eruption”,
“Meteorological disaster”, “Solar flare”, “Tornado”, “Tropical
cyclone”, “Tsunami”, “Volcanic eruption”, “Wildfire”
20 “Disasters list”: https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-Wikipedia-monitoring-for-global-and-realtime-natural-disaster-detection/data/disasters-list.json

Finally, we obtain for each of the 1,270 articles in the
“disasters list” all their back links, i.e., their inbound
links [step (iii) above], which serves to detect instances of
disasters unknown at system creation time. For example, the
article “Typhoon Rammasun (2014)”
(http://en.wikipedia.org/wiki/Typhoon_Rammasun_(2014)), which,
as a concrete instance of a disaster of type tropical cyclone,
is not contained in our “disasters list”, links back to
“Tropical cyclone”
(http://en.wikipedia.org/wiki/Tropical_cyclone), so we can
identify “Typhoon Rammasun (2014)” as related to tropical
cyclones (but not necessarily identify it as a tropical
cyclone), even if at the system’s creation time the typhoon
did not exist yet. Analogously to the inbound links, we obtain
all outbound links of all articles in the “disasters list”,
e.g., “Tropical cyclone” has an outbound link to “2014 Pacific
typhoon season”
(http://en.wikipedia.org/wiki/2014_Pacific_typhoon_season),
which also happens to be an inbound link of “Tropical cyclone”,
so we have detected a mutual, circular link structure. Figure 1
shows the example in its entirety, starting from the seed
level, to the disaster type level, to the in-/outbound link
level. The end result is a large list called the “monitoring
list” of all articles in all Wikipedia languages that are
somehow (via a redirect, inbound, or outbound link, or
resulting mutual link) related to any of the articles in the
“disasters list”. We make a snapshot of this dynamic
“monitoring list” available for reference,²¹ but note that it
will be out-of-date soon and should be regenerated on
a regular basis. The current version holds 141,001 different
articles.
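
The last step, going from the “disasters list” to the “monitoring list”, can be sketched as follows. This is not the authors' implementation: the function name `buildMonitoringList`, the `linkGraph` input (an in-memory stand-in for the link lists that would come from the Wikipedia API), and the shape of the resulting role objects are assumptions made for illustration. Redirects and language links are left out for brevity.

```javascript
// disastersList: array of "language:Title" keys of disaster type articles.
// linkGraph: hypothetical lookup from such a key to its inbound and
// outbound link lists, standing in for Wikipedia API responses.
function buildMonitoringList(disastersList, linkGraph) {
  // No prototype, so article titles like "constructor" are safe as keys.
  const monitoringList = Object.create(null);
  for (const disaster of disastersList) {
    const { inbound = [], outbound = [] } = linkGraph[disaster] || {};
    const inboundSet = new Set(inbound);
    const outboundSet = new Set(outbound);
    for (const article of new Set([...inbound, ...outbound])) {
      // Record in which role the article relates to the disaster type.
      const role = inboundSet.has(article) && outboundSet.has(article)
        ? 'mutual'
        : inboundSet.has(article) ? 'inbound' : 'outbound';
      const roles = monitoringList[article] || (monitoringList[article] = {});
      (roles[role] || (roles[role] = [])).push(disaster);
    }
  }
  return monitoringList;
}

const exampleGraph = {
  'en:Tropical cyclone': {
    inbound: ['en:Typhoon Rammasun (2014)', 'en:2014 Pacific typhoon season'],
    outbound: ['en:2014 Pacific typhoon season'],
  },
};
console.log(buildMonitoringList(['en:Tropical cyclone'], exampleGraph));
```

Keying the result by "language:Title" allows a constant-time lookup whenever an edit event for an article arrives.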


3.3 Monitoring Process 

In the past, we have worked on a Server-Sent Events (SSE) API
(Steiner 2014a) capable of monitoring realtime editing
activity on all language versions of Wikipedia. This API
allows us to easily analyze Wikipedia edits by reacting to
events fired by the API. Whenever an edit event occurs, we
check if it is for one of the articles on our “monitoring
list”. We keep track of the historic one-day-window editing
activity for each article on the “monitoring list” including
their versions in other languages, and, upon a sudden spike of
editing activity, trigger an alert about a potential new
instance of a disaster type that the spiking article is an
inbound or outbound link of (or both). To illustrate this, if,
e.g., the German article “Pazifische Taifunsaison 2014”
including all of its language links is spiking, we can infer
that this is related to a disaster of type “Tropical cyclone”
due to the detected mutual link structure mentioned earlier
(Figure 1).

Figure 1: Extracted Wikipedia link structure (tiny excerpt) starting from the seed article “Natural disaster”

21 “Monitoring list”: https://github.com/tomayac/postdoc/blob/master/papers/comprehensive-Wikipedia-monitoring-for-global-and-realtime-disaster-detection/data/monitoring-list.json
22 Wikipedia last revisions: http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvlimit=6&rvprop=timestamp|user&titles=Typhoon_Rammasun_(2014)

In order to detect spikes, we apply exponential smoothing to
the last n edit intervals (we require n ≥ 5) that occurred in
the past 24 hours with a smoothing factor α = 0.5. The edit
events required for this are retrieved programmatically via
the Wikipedia API.²² As a spike occurs when an edit interval
gets “short enough” compared to historic editing activity, we
report a spike whenever the latest edit interval is shorter
than half a standard deviation, 0.5 × σ.
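
The spike test can be sketched as below. The text leaves some details open, so the sketch makes explicit assumptions that are not confirmed by the source: σ is taken to be the population standard deviation of the exponentially smoothed historic intervals, the latest interval is excluded from that history, and all function and parameter names are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Simple exponential smoothing: s[0] = x[0],
// s[t] = alpha * x[t] + (1 - alpha) * s[t - 1].
function exponentialSmoothing(values, alpha) {
  const smoothed = [];
  values.forEach((value, i) => {
    smoothed.push(i === 0 ? value : alpha * value + (1 - alpha) * smoothed[i - 1]);
  });
  return smoothed;
}

// Population standard deviation.
function standardDeviation(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

// timestamps: edit times in milliseconds; now: current time in milliseconds.
function isSpiking(timestamps, now,
    { alpha = 0.5, minIntervals = 5, factor = 0.5 } = {}) {
  // Only consider edits from the past 24 hours, oldest first.
  const recent = timestamps
    .filter((t) => t <= now && now - t <= DAY_MS)
    .sort((a, b) => a - b);
  const intervals = recent.slice(1).map((t, i) => t - recent[i]);
  if (intervals.length < minIntervals) return false;
  const latest = intervals[intervals.length - 1];
  // Assumption: the history excludes the latest interval.
  const historic = exponentialSmoothing(intervals.slice(0, -1), alpha);
  return latest < factor * standardDeviation(historic);
}
```

Under this reading, an article with perfectly regular edits never spikes (σ is zero), whereas a very short interval after irregular earlier activity does.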

A subset of all Wikipedia articles are geo-referenced,²³ so
when we detect a spiking article, we try to obtain geo
coordinates for the article itself (e.g., “Pazifische
Taifunsaison 2014”) or any of its language links that, as
a consequence of the assumption in H2, may provide more local
details (e.g., “2014 Pacific typhoon season” in English or
“2014年太平洋颱風季” in Chinese). We then calculate the center
point of all obtained latitude/longitude pairs.
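
The text does not state how the center point is computed. One simple possibility, shown here purely as an assumption, is the arithmetic mean of all latitudes and longitudes; the function name `centerPoint` and the `{lat, lon}` shape are hypothetical. Note that a plain mean gives misleading results for points on both sides of the 180° meridian.

```javascript
// Arithmetic mean of latitude/longitude pairs; returns null for no input.
// Assumption: coordinates do not straddle the 180° meridian.
function centerPoint(coordinates) {
  if (coordinates.length === 0) return null;
  const sum = coordinates.reduce(
    (acc, { lat, lon }) => ({ lat: acc.lat + lat, lon: acc.lon + lon }),
    { lat: 0, lon: 0 });
  return {
    lat: sum.lat / coordinates.length,
    lon: sum.lon / coordinates.length,
  };
}

console.log(centerPoint([{ lat: 10, lon: 120 }, { lat: 20, lon: 130 }]));
```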


3.4 Multimedia Content from Online Social 
Networking Sites 


In the past, we have worked on an application called Social
Media Illustrator (Steiner 2014c) that provides a social
multimedia search framework that enables searching for and
extraction of multimedia data from the online social
networking sites Google+,²⁴ Facebook,²⁵ Twitter,²⁶
Instagram,²⁷ YouTube,²⁸ Flickr,²⁹ MobyPicture,³⁰ TwitPic,³¹
and Wikimedia Commons.³² In a first step, it deduplicates
exact- and near-duplicate social multimedia data based on
a previously described algorithm (Steiner et al. 2013). It
then ranks social multimedia data by social signals (Steiner
2014c) based on an abstraction layer on top of the online
social networking sites mentioned above and, in a final step,
allows for the creation of media galleries following aesthetic
principles (Steiner 2014c) of the two kinds Strict Order,
Equal Size and Loose Order, Varying Size, defined in (Steiner
2014c). We have ported crucial parts of the code of Social
Media Illustrator from the client-side to the server-side,
enabling us now to create media galleries at scale and on
demand, based on the titles of spiking Wikipedia articles that
are used as separate search terms for each language. The
social media content therefore does not have to link to
Wikipedia. One exemplary media gallery can be seen in
Figure 2; each individual media item in the gallery is
clickable and links back to the original post on the
particular online social networking site, allowing crisis
responders to monitor the media gallery as a whole, and to
investigate interesting media items at the source and
potentially get in contact with the originator.


23 Article geo coordinates: http://en.wikipedia.org/w/api.php?action=query&prop=coordinates&format=json&colimit=max&coprop=dim|country|region|globe&coprimary=all&titles=September_11_attacks
24 Google+: https://plus.google.com/
25 Facebook: https://www.facebook.com/
26 Twitter: https://twitter.com/
27 Instagram: http://instagram.com/
28 YouTube: http://www.youtube.com/
29 Flickr: http://www.flickr.com/
30 MobyPicture: http://www.mobypicture.com/
31 TwitPic: http://twitpic.com/


3.5 Linked Data Publication 

32 Wikimedia Commons: http://commons.wikimedia.org/wiki/Main_Page

In a final step, once a given confidence threshold has been
reached and upon human inspection, we plan to send out
a notification according to the Common Alerting Protocol
following the format that (for GDACS) can be seen in
Listing 1. While Common Alerting Protocol messages are
generally well understood, additional synergies can be
unlocked by leveraging Linked Data sources like DBpedia,
Wikidata, and Freebase, and interlinking them with detected
potentially relevant multimedia data from online social
networking sites. Listing 2 shows an early-stage proposal for
doing so. The alerts can be exposed as triple pattern
fragments to enable live querying at low cost. This can also
include push, pull, and streaming models, as Linked Data
Fragments (Verborgh et al. 2014) allow for all. A further
approach consists in converting CAP messages to Linked Data by
transforming the CAP eXtensible Markup Language (XML) format
to Resource Description Framework (RDF) and publishing it.
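
As a rough illustration of the planned notification step, the sketch below assembles a minimal CAP 1.2 message for a detected disaster as an XML string. It is not the authors' implementation: the function name `buildCapAlert`, its parameters, and the fixed default values are assumptions, and only a small subset of the elements shown in Listing 1 is produced.

```javascript
// Escape the characters that are special in XML text content.
const escapeXml = (text) => String(text)
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;');

// Build a minimal CAP 1.2 alert. The status defaults to "Test" since
// alerts are only meant to be sent after human inspection.
function buildCapAlert({ identifier, sender, sent, event, headline, web,
    status = 'Test' }) {
  return [
    '<alert xmlns="urn:oasis:names:tc:emergency:cap:1.2">',
    `  <identifier>${escapeXml(identifier)}</identifier>`,
    `  <sender>${escapeXml(sender)}</sender>`,
    `  <sent>${escapeXml(sent)}</sent>`,
    `  <status>${escapeXml(status)}</status>`,
    '  <msgType>Alert</msgType>',
    '  <scope>Public</scope>',
    '  <info>',
    '    <category>Geo</category>',
    `    <event>${escapeXml(event)}</event>`,
    '    <urgency>Unknown</urgency>',
    '    <severity>Unknown</severity>',
    '    <certainty>Possible</certainty>',
    `    <headline>${escapeXml(headline)}</headline>`,
    `    <web>${escapeXml(web)}</web>`,
    '  </info>',
    '</alert>',
  ].join('\n');
}

// Hypothetical identifier and sender; the article URL is from Listing 2.
console.log(buildCapAlert({
  identifier: 'example-en-Hurricane_Gonzalo-1',
  sender: 'alerts@example.org',
  sent: '2014-10-17T12:00:00-00:00',
  event: 'Tropical cyclone',
  headline: 'Spiking Wikipedia article: Hurricane Gonzalo',
  web: 'http://en.wikipedia.org/wiki/Hurricane_Gonzalo',
}));
```

Escaping matters in practice because article titles and URLs with query strings frequently contain ampersands.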


3.6 Implementation Details 

We have created a publicly available prototype demo
application deployed³³ at http://disaster-monitor.herokuapp.com/
that internally connects to the SSE API from (Steiner 2014a).
It is implemented in Node.js on the server, and as
a JavaScript Web application on the client. This application
uses an hourly refreshed version of the “monitoring list” from
Section 3.2 and whenever an edit event sent through the SSE
API matches any of the articles in the list, it checks if,
given this article’s and its language links’ edit history of
the past 24 hours, the current edit event shows spiking
behavior, as outlined in Section 3.3. The core source code of
the monitoring loop can be seen in Listing 3; a screenshot of
the application is shown in Figure 2.

33 Source code: https://github.com/tomayac/postdoc/tree/master/demos/disaster-monitor


// Excerpt: monitoringList, seedArticle, wikipediaEdits, getMonitoringList,
// getGeoData, and getRevisionsData are not part of this listing.
(function() {
  // fired whenever an edit event happens on any Wikipedia
  var parseWikipediaEdit = function(data) {
    var article = data.language + ':' + data.article;
    var disasterObj = monitoringList[article];
    // the article is on the monitoring list
    if (disasterObj) {
      showCandidateArticle(data.article, data.language, disasterObj);
    }
  };

  // fired whenever an article is on the monitoring list
  var showCandidateArticle = function(article, language, roles) {
    getGeoData(article, language, function(err, geoData) {
      getRevisionsData(article, language, function(err, revisionsData) {
        if (revisionsData.spiking) {
          // spiking article
        }
        if (geoData.averageCoordinates.lat) {
          // geo-referenced article, create map
        }
        // trigger alert if article is spiking
      });
    });
  };

  getMonitoringList(seedArticle, function(err, data) {
    // get the initial monitoring list
    if (err) return console.log('Error initializing the app.');
    monitoringList = data;
    console.log('Monitoring ' + Object.keys(monitoringList).length +
        ' candidate Wikipedia articles.');
    // start monitoring process once we have a monitoring list
    var wikiSource = new EventSource(wikipediaEdits);
    wikiSource.addEventListener('message', function(e) {
      return parseWikipediaEdit(JSON.parse(e.data));
    });
    // auto-refresh monitoring list every hour
    setInterval(function() {
      getMonitoringList(seedArticle, function(err, data) {
        if (err) return console.log('Error refreshing monitoring list.');
        monitoringList = data;
        console.log('Monitoring ' + Object.keys(monitoringList).length +
            ' candidate Wikipedia articles.');
      });
    }, 1000 * 60 * 60);
  });
})();

Listing 3: Monitoring loop of the disaster monitor

Figure 2: Screenshot of the Disaster Monitor application
prototype available at http://disaster-monitor.herokuapp.com/
showing detected past disasters on a heatmap and a media
gallery for a currently spiking disaster around “Hurricane
Gonzalo”






























<http://ex.org/disaster/en:Hurricane_Gonzalo> owl:sameAs
    "http://en.wikipedia.org/wiki/Hurricane_Gonzalo",
    "http://live.dbpedia.org/page/Hurricane_Gonzalo",
    "http://www.freebase.com/m/0123kcg5";
  ex:relatedMediaItems _:video1;
  ex:relatedMediaItems _:photo1.

_:video1 ex:mediaUrl "https://mtc.cdn.vine.co/r/videos/82796227091134303173323251712_2ca88ba5444.5.1.16698738182474199804.mp4";
  ex:micropostUrl "http://twitter.com/gpessoao/status/527603540860997632";
  ex:posterUrl "https://v.cdn.vine.co/r/thumbs/231E0009CF1134303174572797952_2.5.1.16698738182474199804.mp4.jpg";
  ex:publicationDate "2014-10-30T03:15:01Z";
  ex:socialInteractions [ ex:likes 1; ex:shares 0 ];
  ex:timestamp 1414638901000;
  ex:type "video";
  ex:userProfileUrl "http://twitter.com/alejandroriano";
  ex:micropost [
    ex:html "Here's Hurricane Gonzalo as seen from the @Space_Station as it orbited above today https://t.co/RpJtOP2bXa";
    ex:plainText "Here's Hurricane Gonzalo as seen from the Space_Station as it orbited above today" ].

_:photo1 ex:mediaUrl "https://upload.wikimedia.org/wikipedia/commons/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo.jpg";
  ex:micropostUrl "https://commons.wikimedia.org/wiki/File:Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg";
  ex:posterUrl "https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_%2822.10.2014%29_01.jpg/500px-Schiffsanleger_Wittenbergen_-_Orkan_Gonzalo_(22.10.2014)_01.jpg";
  ex:publicationDate "2014-10-24T08:40:16Z";
  ex:socialInteractions [ ex:shares 0 ];
  ex:timestamp 1414140016000;
  ex:type "photo";
  ex:userProfileUrl "https://commons.wikimedia.org/wiki/User:Huhu_Uet";
  ex:micropost [
    ex:html "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01";
    ex:plainText "Schiffsanleger Wittenbergen - Orkan Gonzalo (22.10.2014) 01" ].


Listing 2: Exemplary Linked Data for Hurricane Gonzalo using a yet to-be-defined vocabulary (potentially HXL
http://hxl.humanitarianresponse.info/ns/index.html or MOAC http://observedchange.com/moac/ns/) that interlinks the
disaster with several other Linked Data sources and relates it to multimedia content on online social networking sites


4 Proposed Steps Toward an Evaluation 

We recall our core research questions that were Q1 How timely
and accurate for the purpose of disaster detection and ongoing
monitoring is content from Wikipedia, compared to
authoritative sources mentioned above? and Q2 Does the
disambiguated nature of Wikipedia surpass keyword-based
disaster detection approaches, e.g., via online social
networking sites or search logs? Regarding Q1, only a manual
comparison covering several months’ worth of disaster data of
the relevant authoritative data sources mentioned in
Section 1.1 with the output of our system can help respond to
the question. Regarding Q2, we propose an evaluation strategy
for the OSN site Twitter, loosely inspired by the approach of
Sakaki et al. in (Sakaki, Okazaki, and Matsuo 2010). We choose
Twitter as a data source due to the publicly available user
data through its streaming APIs,³⁴ which would be considerably
harder, if not impossible, with other OSNs or search logs due
to privacy concerns and API limitations.

34 Twitter streaming APIs: https://dev.twitter.com/docs/streaming-apis/streams/public

Based on the articles in the “monitoring list”, we put forward
using article titles as search terms, but without
disambiguation hints in parentheses, e.g., instead of the
complete article title “Typhoon Rammasun (2014)”, we suggest
using “Typhoon Rammasun” alone. We advise monitoring the
sample stream³⁵ for the appearance of any of the search terms,
as the filtered stream³⁶ is too limited regarding the number
of supported search terms. In order to avoid ambiguity issues
with the international multi-language tweet stream, we
recommend matching search terms only if the Twitter-detected
tweet language equals the search term’s language, e.g.,
English, as in “Typhoon Rammasun”.
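
The proposed matching strategy can be sketched as follows. The function names `toSearchTerm` and `matchesSearchTerm` are hypothetical, and the sketch assumes tweet objects that expose the tweet text as `text` and the detected language as `lang`; case-insensitive substring matching is a simplification chosen for illustration.

```javascript
// Strip a trailing disambiguation hint in parentheses from an article title.
function toSearchTerm(title) {
  return title.replace(/\s*\([^()]*\)\s*$/, '').trim();
}

// searchTerms: array of {term, language} objects derived from the
// monitoring list. A tweet matches only if its detected language
// equals the language of the search term.
function matchesSearchTerm(tweet, searchTerms) {
  const text = tweet.text.toLowerCase();
  return searchTerms.some(({ term, language }) =>
    language === tweet.lang && text.includes(term.toLowerCase()));
}

const terms = [
  { term: toSearchTerm('Typhoon Rammasun (2014)'), language: 'en' },
];
console.log(matchesSearchTerm(
  { text: 'Typhoon Rammasun makes landfall', lang: 'en' }, terms));
```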


35 Twitter sample stream: https://dev.twitter.com/docs/api/1.1/get/statuses/sample
36 Twitter filtered stream: https://dev.twitter.com/docs/api/1.1/post/statuses/filter


5 Conclusions and Future Work

In this paper, we have presented the first steps of our
ongoing research on the creation of a Wikipedia-based disaster
monitoring system. In particular, we finished its underlying
code scaffolding and connected the system to several online
social networking sites allowing for the automatic generation
of media galleries. Further, we propose to publish data about
detected and monitored disasters as live queryable Linked
Data, which can be made accessible in a scalable and ad hoc
manner using triple pattern fragments (Verborgh et al. 2014)
by leveraging free cloud hosting offers (Matteis and Verborgh
2014). While the system itself already functions, a good chunk
of work still lies ahead with the fine-tuning of its
parameters. A first example is the exponential smoothing
parameters of the revision intervals, responsible for
determining whether an article is spiking, and thus
a potential new disaster, or not. A second example is the role
that disasters play with articles: they can be inbound,
outbound, or mutual links, and their importance for actual
occurrences of disasters will vary. Future work will mainly
focus on finding answers to our research questions Q1 and Q2
and the verification of the hypotheses H1-H3. We will focus on
the evaluation of the system’s usefulness, accuracy, and
timeliness in comparison to other keyword-based approaches. An
interesting aspect of our work is that the monitoring system
is not limited to natural disasters. Using an analogous
approach, we can monitor for human-made disasters (called
“Anthropogenic hazard” on Wikipedia) like terrorism, war,
power outages, air disasters, etc. We have created an
exemplary “monitoring list” and made it available.

Concluding, we are excited about this research and look
forward to putting the final system into operational practice
in the weeks and months to come. Be safe!